medical advice
Mind launches inquiry into AI and mental health after Guardian investigation
The Guardian revealed how people were being put at risk of harm by false and misleading health information in Google AI Overviews. Exclusive: England and Wales charity to examine safeguards after the Guardian exposed 'very dangerous' advice on Google AI Overviews. Mind is launching a significant inquiry into artificial intelligence and mental health after a Guardian investigation exposed how Google's AI Overviews gave people "very dangerous" medical advice. In a year-long commission, the mental health charity, which operates in England and Wales, will examine the risks and safeguards required as AI increasingly influences the lives of millions of people affected by mental health issues worldwide. The inquiry, the first of its kind globally, will bring together the world's leading doctors and mental health professionals, as well as people with lived experience, health providers, policymakers and tech companies.
- Europe > United Kingdom > Wales (0.45)
- Europe > United Kingdom > England (0.45)
- North America > United States (0.16)
- (2 more...)
Google puts users at risk by downplaying health disclaimers under AI Overviews
Google's AI Overviews only issue a warning if users choose to request additional health information by selecting 'Show more'. Google is putting people at risk of harm by downplaying safety warnings that its AI-generated medical advice may be wrong. When answering queries about sensitive topics such as health, the company says its AI Overviews, which appear above search results, prompt users to seek professional help rather than relying solely on its summaries. "AI Overviews will inform people when it's important to seek out expert advice or to verify the information presented," Google has said.
- Europe > Ukraine (0.06)
- Oceania > Australia (0.05)
- North America > United States > Massachusetts (0.05)
- Leisure & Entertainment > Sports (0.71)
- Health & Medicine > Consumer Health (0.58)
- Government > Regional Government (0.51)
- Media > News (0.50)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Social Media (0.74)
- Information Technology > Information Management > Search (0.50)
Dr. Bias: Social Disparities in AI-Powered Medical Guidance
With the rapid progress of large language models (LLMs), the general public now has easy and affordable access to applications capable of answering most health-related questions in a personalized manner. These LLMs are increasingly competitive with, and in some medical capabilities now surpass, professionals. They hold particular promise in low-resource settings, since they offer the possibility of widely accessible, quasi-free healthcare support. However, the evaluations that fuel these motivations largely lack insight into the social nature of healthcare: they are oblivious to health disparities between social groups and to how bias may translate into LLM-generated medical advice and affect users. We provide an exploratory analysis of LLM answers to a series of medical questions spanning key clinical domains, where we simulate these questions being asked by several patient profiles that vary in sex, age range, and ethnicity. By comparing natural language features of the generated responses, we show that, when LLMs are used for medical advice generation, they produce responses that systematically differ between social groups. In particular, Indigenous and intersex patients receive advice that is less readable and more complex. These trends are amplified when intersectional groups are considered. Given the increasing trust individuals place in these models, we argue for higher AI literacy and for urgent investigation and mitigation by AI developers to ensure these systematic differences are diminished and do not translate into unjust patient support. Our code is publicly available on GitHub.
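To make the setup concrete, here is a minimal sketch of this kind of audit, assuming a hypothetical `ask_llm` client and using the `textstat` package for readability scoring. The profile attributes, question, and repetition count are illustrative, not the paper's actual protocol.

```python
# Minimal sketch (not the authors' code): ask the same medical question
# from different simulated patient profiles, then compare readability
# features of the generated answers across groups.
from itertools import product
from statistics import mean

import textstat  # pip install textstat

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call for the model under audit.
    return "Chest pain can have many causes; please seek urgent medical care."

SEXES = ["female", "male", "intersex"]
AGES = ["25-34", "65-74"]
ETHNICITIES = ["white", "Black", "Indigenous"]
QUESTION = "What should I do about persistent chest pain?"
REPEATS = 5  # sample several completions per profile

scores: dict[tuple[str, str, str], list[float]] = {}
for sex, age, eth in product(SEXES, AGES, ETHNICITIES):
    prompt = f"I am a {age}-year-old {eth} {sex} patient. {QUESTION}"
    for _ in range(REPEATS):
        answer = ask_llm(prompt)
        # Flesch-Kincaid grade: higher means harder-to-read advice.
        scores.setdefault((sex, age, eth), []).append(
            textstat.flesch_kincaid_grade(answer)
        )

# Rank profiles by mean reading grade to surface systematic differences.
for profile, grades in sorted(scores.items(), key=lambda kv: -mean(kv[1])):
    print(profile, round(mean(grades), 2))
```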
- North America > United States > Alaska (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
Li, Yahan, Yao, Jifan, Bunyi, John Bosco S., Frank, Adam C., Hwang, Angel, Liu, Ruishan
Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions often mix symptoms, treatment concerns, and emotional needs, requiring answers that balance clinical caution with contextual sensitivity. We present CounselBench, a large-scale benchmark developed with 100 mental health professionals to evaluate and stress-test large language models (LLMs) in realistic help-seeking scenarios. The first component, CounselBench-EVAL, contains 2,000 expert evaluations of answers from GPT-4, LLaMA 3, Gemini, and human therapists on patient questions from the public forum CounselChat. Each answer is rated across six clinically grounded dimensions, with span-level annotations and written rationales. Expert evaluations show that while LLMs achieve high scores on several dimensions, they also exhibit recurring issues, including unconstructive feedback, overgeneralization, and limited personalization or relevance. Responses were frequently flagged for safety risks, most notably unauthorized medical advice. Follow-up experiments show that LLM judges systematically overrate model responses and overlook safety concerns identified by human experts. To probe failure modes more directly, we construct CounselBench-Adv, an adversarial dataset of 120 expert-authored mental health questions designed to trigger specific model issues. Evaluation of 3,240 responses from nine LLMs reveals consistent, model-specific failure patterns. Together, CounselBench establishes a clinically grounded framework for benchmarking LLMs in mental health QA.
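The judge-versus-expert gap the abstract reports can be summarized with a simple aggregation. The sketch below assumes a flat record layout and uses placeholder scores purely to show the shape of the comparison; it is not CounselBench's released data or evaluation code.

```python
# Illustrative sketch: compare mean expert ratings with mean LLM-judge
# ratings per (model, dimension) to surface systematic overrating.
from collections import defaultdict

# Placeholder records for shape only:
# (model, dimension, expert_score, judge_score), scores on a 1-5 scale.
ratings = [
    ("gpt-4", "safety", 3.0, 4.5),
    ("gpt-4", "relevance", 4.0, 4.5),
    ("llama-3", "safety", 2.5, 4.0),
]

gap = defaultdict(list)
for model, dim, expert, judge in ratings:
    gap[(model, dim)].append(judge - expert)  # positive => judge overrates

for (model, dim), diffs in sorted(gap.items()):
    print(f"{model:10s} {dim:10s} judge-expert gap: {sum(diffs)/len(diffs):+.2f}")
```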
- North America > United States > California (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > Virginia (0.04)
- (7 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.94)
Randy Travis' wife defied medical advice to 'pull the plug' during country star's stroke recovery battle
Randy Travis's wife, Mary, told Fox News Digital that doctors advised her to pull the plug after the country music star's stroke in 2013. Randy Travis's wife believes there was "never a doubt" in her husband's mind that he would make it through his debilitating stroke, even though doctors advised her to "pull the plug." During an interview with Fox News Digital, Mary described a crucial moment in Travis's two-and-a-half-year health battle that stood out to her: the moment doctors told her to end her husband's life. "I think Randy, there was never a doubt in Randy's mind that he could make it through it. It was that magical moment that I went to his bedside when they said, 'We need to pull the plug. He's got too many things going against him at that point.' He had gotten a staph infection and three other hospital-born bacterial viruses like Serratia, Pseudomonas, one thing after another, and the doctors were just saying, 'He just doesn't have the strength to get through this,'" Mary said.
- North America > United States > Tennessee > Davidson County > Nashville (0.06)
- North America > United States > District of Columbia > Washington (0.05)
Large language models provide unsafe answers to patient-posed medical questions
Draelos, Rachel L., Afreen, Samina, Blasko, Barbara, Brazile, Tiffany L., Chase, Natasha, Desai, Dimple Patel, Evert, Jessica, Gardner, Heather L., Herrmann, Lauren, House, Aswathy Vaikom, Kass, Stephanie, Kavan, Marianne, Khemani, Kirshma, Koire, Amanda, McDonald, Lauren M., Rabeeah, Zahraa, Shah, Amy
Millions of patients are already using large language model (LLM) chatbots for medical advice on a regular basis, raising patient safety concerns. This physician-led red-teaming study compares the safety of four publicly available chatbots--Claude by Anthropic, Gemini by Google, GPT-4o by OpenAI, and Llama3-70B by Meta--on a new dataset, HealthAdvice, using an evaluation framework that enables quantitative and qualitative analysis. In total, 888 chatbot responses are evaluated for 222 patient-posed advice-seeking medical questions on primary care topics spanning internal medicine, women's health, and pediatrics. We find statistically significant differences between chatbots. The rate of problematic responses varies from 21.6 percent (Claude) to 43.2 percent (Llama), with unsafe responses varying from 5 percent (Claude) to 13 percent (GPT-4o, Llama). Qualitative results reveal chatbot responses with the potential to lead to serious patient harm. This study suggests that millions of patients could be receiving unsafe medical advice from publicly available chatbots, and further work is needed to improve the clinical safety of these powerful tools.
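The headline comparison (21.6% versus 43.2% problematic responses, 222 questions per chatbot) can be sanity-checked with a standard two-proportion chi-square test. The counts below are reconstructed from the abstract's rates, so treat them as approximate rather than the study's exact figures.

```python
# Hedged sketch of the quantitative comparison the study describes:
# test whether problematic-response rates differ between the best and
# worst chatbots using a chi-square test on reconstructed counts.
from scipy.stats import chi2_contingency

N = 222  # advice-seeking questions per chatbot (from the abstract)
claude_bad = round(0.216 * N)  # 21.6% problematic (Claude) -> ~48
llama_bad = round(0.432 * N)   # 43.2% problematic (Llama)  -> ~96

table = [
    [claude_bad, N - claude_bad],
    [llama_bad, N - llama_bad],
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4g}")  # small p => rates differ
```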
- North America > United States > California > San Francisco County > San Francisco (0.28)
- North America > United States > Virginia > Falls Church (0.04)
- North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
- (11 more...)
Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian
Niculae, Andrei, Cosma, Adrian, Dumitrache, Cosmin, Rǎdoi, Emilian
Text-based telemedicine has become increasingly common, yet the quality of medical advice in doctor-patient interactions is often judged more on how advice is communicated than on its clinical accuracy. To address this, we introduce Dr. Copilot, a multi-agent large language model (LLM) system that supports Romanian-speaking doctors by evaluating and enhancing the presentation quality of their written responses. Rather than assessing medical correctness, Dr. Copilot provides feedback along 17 interpretable axes. The system comprises three LLM agents with prompts automatically optimized via DSPy. Designed with low-resource Romanian data and deployed using open-weight models, it delivers real-time, specific feedback to doctors within a telemedicine platform. Empirical evaluations and live deployment with 41 doctors show measurable improvements in user reviews and response quality, marking one of the first real-world deployments of LLMs in Romanian medical settings.
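As a rough illustration of the DSPy pattern the abstract names, a feedback agent can be declared as a signature with rating axes as output fields. The three axes, field names, and model choice below are assumptions, not Dr. Copilot's actual 17-axis configuration, and DSPy's API details vary across versions.

```python
# Assumed, simplified sketch of a DSPy feedback agent; field names and
# axes are illustrative, not Dr. Copilot's configuration.
import dspy

class ResponseFeedback(dspy.Signature):
    """Rate a doctor's written reply on presentation-quality axes (1-5)
    and suggest one concrete improvement. Does not judge clinical accuracy."""
    patient_question: str = dspy.InputField()
    doctor_reply: str = dspy.InputField()
    empathy: int = dspy.OutputField(desc="1-5 rating")
    clarity: int = dspy.OutputField(desc="1-5 rating")
    completeness: int = dspy.OutputField(desc="1-5 rating")
    suggestion: str = dspy.OutputField(desc="one actionable rewrite tip")

# Any chat model works here; an open-weight model matches the paper's setup.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
feedback = dspy.Predict(ResponseFeedback)
result = feedback(
    patient_question="My child has had a fever for three days.",
    doctor_reply="Give paracetamol.",
)
print(result.clarity, result.suggestion)
```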
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
- (2 more...)
- Research Report (0.64)
- Overview (0.46)
Towards physician-centered oversight of conversational diagnostic AI
Vedadi, Elahe, Barrett, David, Harris, Natalie, Wulczyn, Ellery, Reddy, Shashir, Ruparel, Roma, Schaekermann, Mike, Strother, Tim, Tanno, Ryutaro, Sharma, Yash, Lee, Jihyeon, Hughes, Cían, Slack, Dylan, Palepu, Anil, Freyberg, Jan, Saab, Khaled, Liévin, Valentin, Weng, Wei-Hung, Tu, Tao, Liu, Yun, Tomasev, Nenad, Kulkarni, Kavita, Mahdavi, S. Sara, Guu, Kelvin, Barral, Joëlle, Webster, Dale R., Manyika, James, Hassidim, Avinatan, Chou, Katherine, Matias, Yossi, Kohli, Pushmeet, Rodman, Adam, Natarajan, Vivek, Karthikesalingam, Alan, Stutz, David
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is a regulated activity reserved for licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability for the clinical decision. This effectively decouples oversight from intake, allowing it to happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.
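A conceptual sketch of the guardrail-and-routing pattern described here, not g-AMIE's implementation: drafts that cross into individualized diagnosis or treatment are withheld from the patient and queued for the overseeing physician, while history-taking questions pass through. The patterns and wording are illustrative.

```python
# Conceptual sketch of decoupled oversight: the intake agent may ask
# questions and summarize, but drafts containing individualized advice
# are routed to the overseeing physician instead of the patient.
import re

ADVICE_PATTERNS = [
    r"\byou (likely |probably )?have\b",                       # individualized diagnosis
    r"\b(take|start|stop|increase)\b.*\b(mg|dose|tablet)\b",   # treatment plan
    r"\bI recommend\b",
]

def violates_guardrail(draft: str) -> bool:
    """True if a draft reply crosses into regulated, individualized advice."""
    return any(re.search(p, draft, re.IGNORECASE) for p in ADVICE_PATTERNS)

def route(draft: str, pcp_queue: list[str]) -> str:
    if violates_guardrail(draft):
        pcp_queue.append(draft)  # physician reviews and retains accountability
        return ("I've noted this for the doctor reviewing your case; "
                "they will follow up on diagnosis and treatment.")
    return draft  # history-taking questions and summaries pass through

queue: list[str] = []
print(route("You likely have strep throat; take amoxicillin 500 mg.", queue))
print(route("How long have you had the sore throat?", queue))
print(f"drafts awaiting physician review: {len(queue)}")
```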
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.92)
AI companies have stopped warning you that their chatbots aren't doctors
"Then one day this year," Sharma says, "there was no disclaimer." Curious to learn more, she tested generations of models introduced as far back as 2022 by OpenAI, Anthropic, DeepSeek, Google, and xAI--15 in all--on how they answered 500 health questions, such as which drugs are okay to combine, and how they analyzed 1,500 medical images, like chest x-rays that could indicate pneumonia. The results, posted in a paper on arXiv and not yet peer-reviewed, came as a shock--fewer than 1% of outputs from models in 2025 included a warning when answering a medical question, down from over 26% in 2022. Just over 1% of outputs analyzing medical images included a warning, down from nearly 20% in the earlier period. To seasoned AI users, these disclaimers can feel like formality--reminding people of what they should already know, and they find ways around triggering them from AI models.
A Systematic Analysis of Declining Medical Safety Messaging in Generative AI Models
Sharma, Sonali, Alaa, Ahmed M., Daneshjou, Roxana
Generative AI models, including large language models (LLMs) and vision-language models (VLMs), are increasingly used to interpret medical images and answer clinical questions. Their responses often include inaccuracies; therefore, safety measures like medical disclaimers are critical to remind users that AI outputs are not professionally vetted or a substitute for medical advice. This study evaluated the presence of disclaimers in LLM and VLM outputs across model generations from 2022 to 2025. Using 500 mammograms, 500 chest X-rays, 500 dermatology images, and 500 medical questions, outputs were screened for disclaimer phrases. Medical disclaimer presence in LLM and VLM outputs dropped from 26.3% in 2022 to 0.97% in 2025, and from 19.6% in 2023 to 1.05% in 2025, respectively. By 2025, the majority of models displayed no disclaimers. As public models become more capable and authoritative, disclaimers must be implemented as a safeguard adapting to the clinical context of each output.
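The screening step lends itself to a simple phrase-matching pass over model outputs. The sketch below uses an illustrative pattern list, which may differ from the paper's actual screening criteria.

```python
# Hedged sketch of disclaimer screening: flag whether a model output
# contains a medical-disclaimer phrase, then compute the disclaimer rate.
import re

DISCLAIMER_PATTERNS = [
    r"not (a|an) (doctor|medical professional|physician)",
    r"not (medical|professional) advice",
    r"consult (a|your) (doctor|physician|healthcare provider)",
    r"seek (professional|medical) (help|advice|attention)",
]

def has_disclaimer(output: str) -> bool:
    return any(re.search(p, output, re.IGNORECASE) for p in DISCLAIMER_PATTERNS)

# Illustrative outputs; in the study these would be thousands of LLM/VLM responses.
outputs = [
    "The X-ray shows possible consolidation in the right lower lobe.",
    "This could be pneumonia, but I'm not a doctor; consult your physician.",
]
rate = sum(map(has_disclaimer, outputs)) / len(outputs)
print(f"disclaimer rate: {rate:.1%}")
```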
- North America > United States > California > San Francisco County > San Francisco (0.28)
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.05)
- (3 more...)
- Health & Medicine > Therapeutic Area (0.95)
- Health & Medicine > Diagnostic Medicine > Imaging (0.70)